CHAPTER 17 More of a Good Thing: Multiple Regression 247
meaning models with the same outcome variable, but different groups of predic-
tors. You also use some sort of strategy in choosing the order in which you intro-
duce the predictors into the iterative models, which is described in Chapter 20. So
imagine that you used our example data set and — in one iteration — ran a model
to predict SBP with Age and other predictors in it, and the coefficient for Age was
statistically significant. Now, imagine you added Weight to that model, and in the
new model, Age was no longer statistically significant! You’ve just been visited by
the collinearity fairy.
In the example from Table 17-2, there’s a statistically significant positive correla-
tion between each predictor and the outcome. We figured this out when running
the correlations for Figure 17-1, but you could check our work by running the data
from Table 17-2 through a straight-line regression, as described in Chapter 16. In
contrast, the multiple regression output in Figure 17-2 shows that neither Age nor
Weight is statistically significant in the model, meaning that neither has a
regression coefficient that is statistically significantly different from zero! Why
are they associated with the outcome in correlation analysis but not in multiple
regression?
The answer is collinearity. In the regression world, the term collinearity (also
called multicollinearity) refers to a strong correlation between two or more of the
predictor variables. If you run a correlation between Age and Weight (the two pre-
dictors), you’ll find that they’re statistically significantly correlated with each
other. This situation is what destroys the statistically significant p values you
see on some predictors in iterative models when doing multiple regression.
The problem with collinearity is that you cannot tell which of the two predictor
variables is actually influencing the outcome more, because they are fighting over
explaining the variability in the dependent variable. Although models with col-
linearity are valid, they are hard to interpret if you are looking for cause-and-
effect relationships, meaning you are doing causal inference. Chapter 20 provides
philosophical guidance on dealing with collinearity in modeling.
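One standard way to quantify how badly two predictors are fighting over the same variability is the variance inflation factor (VIF), which grows as the correlation between predictors grows. The chapter doesn't cover VIF explicitly, so treat the following as a supplementary sketch; the age and weight values are made up for illustration, and the two-predictor VIF formula used is 1 / (1 − r²), where r is the correlation between the predictors.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when a model has exactly two predictors.
    VIF = 1 means no collinearity; values much above ~5-10 are a red flag."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical Age and Weight values, deliberately made to rise together
age = [25, 34, 41, 52, 63, 70]
weight = [60, 68, 72, 80, 88, 95]

print("r between predictors:", round(pearson_r(age, weight), 3))
print("VIF:", round(vif_two_predictors(age, weight), 1))
```

With predictors this strongly correlated, the VIF is enormous, which is exactly the situation where a coefficient that was significant in a one-predictor model loses significance when the second predictor joins the model.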
Calculating How Many Participants
You Need
Studies should aim to enroll a sample large enough that you have a good chance of
getting a statistically significant result for your primary research hypothesis,
assuming the effect you're testing is large enough to be of clinical
importance. So if the main hypothesis of your study is going to be tested by a
multiple regression analysis, you should theoretically do a calculation to deter-
mine the sample size you need to support that analysis.
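Before doing a formal power calculation, many analysts sanity-check their plans against a common rule of thumb of roughly 10 to 20 participants per predictor in the model. This rule is a rough guide, not a substitute for a proper calculation, and the middle value of 15 used below is an arbitrary choice for illustration.

```python
def rule_of_thumb_n(num_predictors, per_predictor=15):
    """Rough minimum sample size for a multiple regression model,
    using the common (and debated) guideline of about 10-20
    participants per predictor; 15 is an arbitrary middle value."""
    return num_predictors * per_predictor

# Two predictors (Age and Weight), as in the SBP example
print("Suggested minimum n:", rule_of_thumb_n(2))
```

A formal sample-size calculation for multiple regression would instead work from the expected effect size, the desired power, and the significance level.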